Preprocessing data

Fisrt we read in the preprocessed data.

sceset <- readRDS("../rds/sce_cancerFT_afterQC.rds")

This dataset is after QC. You can see the QC results. There are 3877 QC-passed cells.

kable(table(sceset$description))
Var1 Freq
bulk 27
cell 3877
doublet 63
empty 18
failed 858

Now we only keeps the QC-passed ones for the following analysis.

sceset <- sceset[,sceset$description == "cell"]

Scale and centred data

Now we start to prepare for the clustering. The clustering was performed with an in-house protocol called clincluster. In this cluster, the cells are clustering based on the condition (fresh, cryopreserved and cultured)

source("clincluster/clincluster_functions.R")
sceset@colData$source2 <- plyr::mapvalues(x = sceset$source, 
                                          from = unique(sceset@colData$source), 
                                          to = c("cryo","fresh","longC", "onC","longC"))
sceset <- PrepareData(sceset, col.for.cluster = "source2", do.scale = T)
## [1] "Normalising the data..."
## [1] "Centre the data"

High variable genes

Next we select the high variance gene based on the average expression and the dispersion. In total, 1258 genes were found as HVGs.

sceset <- HighVarGenes(sceset, verbose = T, mean.high.cutoff = 6, mean.low.cutoff = 0.3, dispersion.low.cutoff = 1, dispersion.high.cutoff = 7.5) 
## [1] "Finding high variance genes..."
## [1] "1258 high variance genes are found."
kable(table(rowData(sceset)$high.var))
Var1 Freq
FALSE 20852
TRUE 1258

The following plot shows the cutoffs for the HVGs selection. In subfigure (A) The x-axis is the average expression and the y-axis is the dispersion of expression scaled by the mean expression. The subfigure (B) is slightly different. the y-axis is the dispersion of expression.

df_plot <- data.frame(gene.mean = rowData(sceset)$gene.mean,
                      gene.dispersion.scaled = rowData(sceset)$gene.dispersion.scaled,
                      gene.dispersion = rowData(sceset)$gene.dispersion,
                      high.var = rowData(sceset)$high.var)
p1 <- ggplot(data = df_plot , aes(x = gene.mean, y = gene.dispersion.scaled, col = high.var) ) +
      geom_point(alpha=0.4) +  geom_hline(yintercept = 1, alpha = 0.5, col = "grey" ) +
      geom_vline(xintercept = 0.3, alpha = 0.5, col = "grey" ) + 
    geom_vline(xintercept = 6, alpha = 0.5, col = "grey")
p2 <- ggplot(data = df_plot, aes(x = gene.mean, y = gene.dispersion, col = high.var) ) +
      geom_point(alpha=0.4)
plot_grid(p1,p2, labels = "AUTO")

Run tSNE

Next we run the tSNE calculation embeded in scater for visualisation by using the HVGs we just found.

set.seed(12345)
sceset <- runTSNE(object = sceset, ncomponents = 2, feature_set = rownames(sceset)[rowData(sceset)$high.var],
                  exprs_values = "logcounts",
                  perplexity = min(50, floor(ncol(sceset)/5)))

This plot shows you how the data looks on a 2d space.

p1 <- plotTSNE(sceset, colour_by = "source")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p2 <- plotTSNE(sceset, colour_by = "Patient2")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
plot_grid(p1,p2,labels = "AUTO")

Calculate the first 20 PCs

To reduce the dimensions for clustering, we are calculating 20 PCs from HVGs and the log-transformed counts.

sceset <- runPCA(object = sceset, ncomponents = 20, 
                 exprs_values = "logcounts", rand_seed = 12345,
                 feature_set = rownames(sceset)[rowData(sceset)$high.var == TRUE])
plot(1:20, attr(sceset@reducedDims$PCA, "percentVar")[1:20])

Initial clustering

Cluster each group seperately

There are four groups. We first cluster the cells within each group.

kable(table(sceset$group.for.cluster))
Var1 Freq
cryo 1177
fresh 2132
longC 161
onC 407

The initial clustering is based on knn-spectral clustering.

sceset <- InitialCluster(sceset, k = 6, ncomponents = 1:12, n.neighbor = 7)
## [1] "Got data ready"
## [1] "Initial clustering"
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
plotTSNE(sceset, colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.

p1 <- plotTSNE(sceset[,sceset$source2 == "fresh"], colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p2 <- plotTSNE(sceset[,sceset$source2 == "cryo"], colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p3 <- plotTSNE(sceset[,sceset$source2 == "longC"], colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p4 <- plotTSNE(sceset[,sceset$source2 == "onC"], colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
cowplot::plot_grid(p1,p2,p3,p4)

Calculating the distance matrix by limma voom

Making DGEList

First we remove lowly expressed genes. 18099 genes are kept for DE analysis

matrix <- expm1(logcounts(sceset))
keep <- rowSums(matrix > 1) > 5
sum(keep)
## [1] 18099
dge <- edgeR::DGEList(counts = matrix[keep,]) # make a edgeR object
rm(matrix,keep)
sceset@colData$initial.cluster <- gsub(pattern = " ", replacement = "_", x = sceset@colData$initial.cluster)
sceset@colData$initial.cluster <- gsub(pattern = "-", replacement = "_", x = sceset@colData$initial.cluster)
design <- model.matrix(~  0 + initial.cluster, data = sceset@colData)  # Use 0 because we do not need intercept for this linear model
colnames(design)
##  [1] "initial.clustercryo.1"  "initial.clustercryo.2" 
##  [3] "initial.clustercryo.3"  "initial.clustercryo.4" 
##  [5] "initial.clustercryo.5"  "initial.clustercryo.6" 
##  [7] "initial.clusterfresh.1" "initial.clusterfresh.2"
##  [9] "initial.clusterfresh.3" "initial.clusterfresh.4"
## [11] "initial.clusterfresh.5" "initial.clusterfresh.6"
## [13] "initial.clusterlongC.1" "initial.clusterlongC.2"
## [15] "initial.clusterlongC.3" "initial.clusterlongC.4"
## [17] "initial.clusterlongC.5" "initial.clusterlongC.6"
## [19] "initial.clusteronC.1"   "initial.clusteronC.2"  
## [21] "initial.clusteronC.3"   "initial.clusteronC.4"  
## [23] "initial.clusteronC.5"   "initial.clusteronC.6"

voom

v <- voom(dge, design, plot = F)
fit <- lmFit(v, design)

initial.clusters <- colnames(design)
nc <- ncol(design)
## Automating makeContrasts call in limma
contrast_all <- gtools::permutations(v = initial.clusters, n = nc, r = 2)
contrast_all <- apply(contrast_all, MARGIN = 1, function(x) return(paste(x[1],"-",x[2], sep = "")))
cont.matrix <- makeContrasts(contrasts = contrast_all,
                             levels=design)

fit2 <- contrasts.fit(fit, cont.matrix) 
fit2 <- eBayes(fit2) 

Distance matrix

n_deg <- matrix(0, ncol = nc, nrow = nc)  # number of DE genes
colnames(n_deg) <- rownames(n_deg) <- gsub(x = colnames(design)[1:nc], pattern = "initial.cluster",replacement = "")

logcount <- logcounts(sceset)[rownames(sceset) %in% rownames(dge),]

for(i in 1:(nc-1)) {
    for(j in (i+1):nc) {
        if(i == j) {
            n_deg[i,j] <- 0
        } else if (j < i) {
            coef_k = (i-1)*(nc-1)+j
        } else if (j > i) {
            coef_k = (i-1)*(nc-1)+j-1
        }
        
        if(i != j) {
            rls <- topTable(fit2, n = Inf, coef = coef_k, sort = "p", lfc = 0.6, p = 0.05 )
            if(nrow(rls) > 1) {
                v_expr <- logcount[match(rownames(rls),rownames(logcount)), sceset$initial.cluster == rownames(n_deg)[i]]
                rls$ratio1 <- rowSums(v_expr > 0.5)/ncol(v_expr)
                v_expr <- logcount[match(rownames(rls),rownames(logcount)), sceset$initial.cluster == colnames(n_deg)[j]]
                rls$ratio2 <- rowSums(v_expr > 0.5)/ncol(v_expr)
                rls$ratiomax <- rowMaxs(as.matrix(rls[,c("ratio1", "ratio2")]))
                rls$ratiomin <- rowMins(as.matrix(rls[,c("ratio1", "ratio2")]))
                rls <- rls[rls$ratiomax > 0.25, ]
                n_deg[i,j] <- sum(apply(rls, MARGIN = 1, function(x) return(abs(x[1]) * (x[9]+0.01)/(x[10]+0.01)))) ## 0.01 is used here to enhance the differences of on-off genes
            } else if (nrow(rls) == 1) {
                n_deg[i,j] <- sum(rls$logFC)
            }
            ## This eqaution take fold change and expression ratio into account
        }
    }
}

n_deg <- n_deg + t(n_deg)

Final cluster

hc <- hclust(as.dist(n_deg))
plot(hc); rect.hclust(hc, k = 12, border = "red")

hc.cluster <- cutree(hc, k = 12)
colData(sceset)$clincluster <- hc.cluster[match(colData(sceset)$initial.cluster, names(hc.cluster))]
colData(sceset)$clincluster <- as.factor(colData(sceset)$clincluster)
table(colData(sceset)$clincluster)
## 
##    1    2    3    4    5    6    7    8    9   10   11   12 
##   73  166  404  155  444  186 1097  890  145   73   73  171
plotTSNE(sceset, colour_by = "clincluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.

Identifying marker genes

matrix <- expm1(logcounts(sceset))
keep <- rowSums(matrix > 1) > 5
dge <- edgeR::DGEList(counts = matrix[keep,]) # make a edgeR object

logcount <- logcounts(sceset)[rownames(sceset) %in% rownames(dge),]
markers <- c()

# pb <- txtProgressBar(min = 0, max =  (length(unique(sceset$clincluster))), style = 3)
for(i in 1:length(unique(sceset$clincluster))){
    info <- rep("control", ncol(sceset))
    info[sceset$clincluster == i] <- "group"
    design <- model.matrix(~ 0 + info)
    v <- voom(dge, design, plot = F)
    fit <- lmFit(v, design) # Linear Model for Series of Arrays
    cont.matrix <- makeContrasts(contrasts = "infogroup-infocontrol",levels=design)
    fit <- contrasts.fit(fit, cont.matrix ) # Linear Model for Series of Arrays
    fit <- eBayes(fit)
    
    marker <- topTable(fit, p.value = 0.05, number = Inf, coef = 1, lfc = 0.6, sort.by = "logFC")
    marker <- marker[marker$logFC > 0.6,]
     
     v_expr <- logcount[match(rownames(marker),rownames(logcount)), info == "group"]
     marker$ratio1 <- rowSums(v_expr > 0.5)/ncol(v_expr)
     v_expr <- logcount[match(rownames(marker),rownames(logcount)),info != "group"]
     marker$ratio2 <- rowSums(v_expr > 0.5)/ncol(v_expr)
                
    marker$gene <- rownames(marker) 
    marker$cluster <- i
    markers  <- rbind(markers, marker)
    # setTxtProgressBar(pb, i)
}
# close(pb)
markers$cluster <- factor(markers$cluster)
top10 <- markers %>% group_by(cluster) %>% top_n(10, logFC)
## Warning: The `printer` argument is deprecated as of rlang 0.3.0.
## This warning is displayed once per session.
kable(top10)
logFC AveExpr t P.Value adj.P.Val B ratio1 ratio2 gene cluster
7.493925 2.615666 17.858006 0 0 144.52501 0.9863014 0.0386435 SRGN 1
7.185128 2.489490 17.843414 0 0 144.31111 0.9041096 0.0084122 PTPRC 1
6.858956 2.535946 15.108411 0 0 102.75485 0.8904110 0.0207676 LAPTM5 1
6.796910 3.642329 15.693876 0 0 111.10035 0.8767123 0.2076761 CXCR4 1
6.165879 2.472423 16.956048 0 0 130.22263 0.8493151 0.0113039 CD53 1
5.855646 2.469413 14.386742 0 0 92.83496 0.7534247 0.0105152 CD96 1
5.416764 3.347757 13.304994 0 0 78.71642 0.8356164 0.1713985 LCP1 1
5.393131 2.499700 11.885707 0 0 61.76532 0.7534247 0.0186646 LSP1 1
5.183070 4.394711 13.543537 0 0 81.71635 0.9178082 0.3845952 ARHGDIB 1
4.992156 3.341753 15.257743 0 0 104.90180 0.9041096 0.2478970 STK17B 1
6.763069 3.501691 20.209524 0 0 183.56233 0.9457831 0.1374293 TFF3 2
5.365099 4.266078 20.635799 0 0 191.30479 0.9638554 0.3435732 GOLM1 2
5.346003 3.639632 16.897211 0 0 128.48156 0.8855422 0.1616815 PIFO 2
5.139665 3.262308 16.772869 0 0 126.64423 0.9879518 0.1126381 C1orf194 2
4.978251 5.165588 16.358361 0 0 120.14656 0.9277108 0.4217192 MAP1B 2
4.910626 3.551296 18.760309 0 0 158.62713 0.9156627 0.2023713 PPIL6 2
4.747680 3.260331 14.960418 0 0 100.19840 0.9337349 0.1005120 FAM183A 2
4.682733 3.279713 14.843187 0 0 98.58025 0.8614458 0.1037456 MS4A8B 2
4.450050 3.206303 14.289734 0 0 91.10446 0.9337349 0.0959310 TMEM190 2
4.418859 3.190226 15.902213 0 0 113.67290 0.8493976 0.1172191 ARMC3 2
3.156161 3.182085 20.805783 0 0 197.00896 0.5321782 0.1238123 GCM1 3
2.303796 5.898201 16.302024 0 0 120.64638 0.7920792 0.7014109 RSRC2 3
2.115385 6.325972 12.415795 0 0 67.98545 0.7178218 0.6677224 CLK1 3
1.671212 5.239914 9.946290 0 0 41.36550 0.5891089 0.5050389 HBP1 3
1.571460 4.342570 9.994565 0 0 41.81005 0.5173267 0.3723006 DDIT3 3
1.520377 3.535779 11.644425 0 0 59.03127 0.4603960 0.2732508 MXD1 3
1.453073 6.663646 8.603633 0 0 29.26617 0.7376238 0.6778002 STIP1 3
1.450149 3.993107 9.392428 0 0 36.11889 0.4257426 0.3031961 KANSL2 3
1.433712 8.256008 13.794616 0 0 85.19311 0.9678218 0.9386697 RBM39 3
1.380918 6.212063 9.652074 0 0 38.57855 0.7277228 0.7293406 GPBP1 3
8.280606 3.501691 24.781825 0 0 272.35608 0.9419355 0.1399785 TFF3 4
7.432450 3.260331 22.944057 0 0 235.18380 0.9935484 0.1004836 FAM183A 4
7.329172 3.262308 23.949950 0 0 255.40654 0.9870968 0.1152606 C1orf194 4
7.205601 4.794764 22.740933 0 0 230.89815 0.9806452 0.3707684 AGR2 4
6.949645 3.206303 21.782556 0 0 212.75857 0.9677419 0.0969909 TMEM190 4
6.845368 3.242445 21.053827 0 0 199.10222 0.9225806 0.1002149 C9orf24 4
6.814381 3.229279 20.613117 0 0 191.02397 0.9419355 0.0956475 C20orf85 4
6.796722 3.279713 21.082596 0 0 199.63412 0.8903226 0.1047824 MS4A8B 4
6.442215 4.799370 21.979126 0 0 216.26256 0.9935484 0.4032778 AGR3 4
6.434406 3.543655 22.013678 0 0 217.17336 0.9354839 0.1794734 RSPH1 4
4.330928 3.934455 23.471224 0 0 249.33460 0.8040541 0.1706962 PLAU 5
4.303066 6.004591 24.369896 0 0 267.94182 0.9707207 0.5219924 ANXA3 5
4.258083 4.457910 26.516998 0 0 314.65169 0.9414414 0.3049811 TNFRSF21 5
4.223105 4.517445 26.482070 0 0 313.86835 0.9527027 0.3233324 CDCP1 5
4.059569 5.414029 22.942299 0 0 238.55217 0.9707207 0.4276143 MARS 5
4.015646 7.841194 22.113260 0 0 222.04860 0.9932432 0.7325954 LAMC2 5
3.972775 4.722760 22.913576 0 0 237.99298 0.8941441 0.3279930 TUBB6 5
3.843969 3.838613 25.343625 0 0 288.84202 0.8445946 0.2175939 TGFA 5
3.792899 5.219056 24.541175 0 0 271.59277 0.9617117 0.4756773 ADAM9 5
3.759404 5.165588 20.356696 0 0 188.79738 0.9324324 0.3801340 MAP1B 5
6.511838 3.133207 26.316379 0 0 309.42243 0.8494624 0.1110810 SERPINE1 6
4.943719 5.995885 14.628581 0 0 96.36281 0.7634409 0.4760228 IL8 6
4.637100 3.611697 17.940146 0 0 146.52447 0.7204301 0.1964237 GEM 6
4.510466 4.165234 19.214042 0 0 168.10209 0.9408602 0.3606069 LGALS1 6
4.155748 3.990554 17.780649 0 0 143.90752 0.8064516 0.3102140 PHLDA1 6
4.042896 5.534019 15.975231 0 0 115.67622 0.8387097 0.5429423 FOSL1 6
3.967744 4.992061 16.245861 0 0 119.74555 0.8387097 0.4584124 MPZL1 6
3.865202 2.732882 14.882151 0 0 99.92355 0.5591398 0.0392847 DCN 6
3.748799 6.652151 15.465753 0 0 108.18656 0.9677419 0.7174208 UBB 6
3.677978 2.677949 13.619427 0 0 82.91723 0.5698925 0.0349499 MEG3 6
4.131843 8.525154 25.670314 0 0 295.82112 0.9453054 0.5845324 FOS 7
4.127485 7.073289 29.612023 0 0 386.50868 0.9288970 0.4798561 NR4A1 7
3.621659 7.930901 29.676534 0 0 388.02761 0.9908842 0.6863309 EGR1 7
3.189548 8.979067 21.394827 0 0 208.17844 0.9307201 0.6787770 CRISP3 7
3.137517 6.986051 27.338675 0 0 333.18502 0.9197812 0.6143885 C1orf63 7
3.042969 8.503664 23.229930 0 0 244.33647 0.9653601 0.6791367 MSLN 7
3.027996 5.597053 28.420278 0 0 358.29761 0.9033728 0.4316547 FOSB 7
2.778039 6.560015 19.815126 0 0 178.97554 0.8012762 0.4881295 CXCL2 7
2.715983 6.139232 22.494649 0 0 229.62501 0.8204193 0.4946043 BTG2 7
2.670104 6.695662 23.356205 0 0 246.95068 0.8869644 0.6046763 NCOA7 7
4.065131 7.073289 26.390556 0 0 311.95681 0.9797753 0.4958152 NR4A1 8
4.023885 8.525154 22.817688 0 0 236.12411 0.9876404 0.5969200 FOS 8
3.930515 6.974367 29.084374 0 0 374.15095 0.9640449 0.5544024 C3 8
3.787296 8.180191 26.128197 0 0 306.10553 0.9719101 0.6548376 HLA-DRA 8
3.772599 6.560015 25.203502 0 0 285.88876 0.9224719 0.4737195 CXCL2 8
3.715153 6.175397 29.730715 0 0 389.68391 0.9494382 0.4974891 ZC3H12A 8
3.681647 8.979067 23.003542 0 0 239.86334 0.9595506 0.6876465 CRISP3 8
3.585745 4.803819 26.901401 0 0 323.49267 0.7370787 0.2597924 CHI3L1 8
3.436071 7.803568 26.646823 0 0 317.68884 0.9719101 0.7037161 HLA-DRB1 8
3.427348 5.929517 24.216766 0 0 264.87824 0.8022472 0.4218279 HLA-DPA1 8
9.503120 3.603942 28.493126 0 0 359.30944 1.0000000 0.1623794 TPPP3 9
8.142990 3.229279 23.648591 0 0 252.64510 1.0000000 0.0956592 C20orf85 9
8.122980 3.100361 23.610057 0 0 251.86028 1.0000000 0.0731511 ZMYND10 9
7.994897 3.639632 23.455852 0 0 248.68118 1.0000000 0.1613076 PIFO 9
7.704140 3.242445 22.614286 0 0 231.74689 0.9724138 0.1004823 C9orf24 9
7.623226 2.959397 22.185386 0 0 223.29851 1.0000000 0.0568060 SNTN 9
7.557494 4.474227 24.482443 0 0 269.96994 1.0000000 0.3494105 CAPS 9
7.539040 2.973098 22.753646 0 0 234.53992 0.9862069 0.0632369 LDLRAD1 9
7.373327 3.260331 21.445322 0 0 208.98802 1.0000000 0.1026259 FAM183A 9
7.146212 3.206303 21.374811 0 0 207.65254 0.9586207 0.0996785 TMEM190 9
6.075956 4.165234 15.430737 0 0 104.81184 1.0000000 0.3767087 LGALS1 10
6.073755 4.020645 13.634885 0 0 81.06973 0.9452055 0.2665615 CCNA1 10
5.948681 5.203586 12.488000 0 0 67.17488 0.9863014 0.4379600 TGM2 10
5.919642 3.419420 13.884339 0 0 84.32173 0.8767123 0.1874343 UCA1 10
5.689158 2.740400 14.781466 0 0 96.23187 0.9863014 0.0686120 SCD 10
5.647768 3.122154 13.466814 0 0 79.15995 0.9041096 0.1282860 WNT7A 10
5.534420 3.377337 13.419972 0 0 78.55696 0.9041096 0.1845426 MFAP5 10
5.526872 4.079079 12.339285 0 0 65.62422 0.9726027 0.2789169 ALDH1A3 10
5.460513 3.761994 14.160684 0 0 87.87904 0.9863014 0.2773396 ACAT2 10
5.406879 3.560699 12.870324 0 0 71.88712 0.9863014 0.2003155 DHCR7 10
5.603419 4.165234 14.436834 0 0 93.06120 0.9726027 0.3772345 LGALS1 11
5.492215 3.377337 13.536047 0 0 81.33709 0.8082192 0.1863828 MFAP5 11
5.198020 5.828861 9.256727 0 0 34.99960 0.9178082 0.4277077 KRT17 11
5.131380 3.419420 12.247042 0 0 65.69621 0.8082192 0.1887487 UCA1 11
5.087037 5.203586 10.890454 0 0 50.74487 0.8904110 0.4398002 TGM2 11
5.076277 3.721301 11.900766 0 0 61.72685 0.8356164 0.2360673 TAGLN 11
4.859095 8.297745 10.459261 0 0 46.30541 1.0000000 0.7549947 KRT7 11
4.743550 3.680273 13.298494 0 0 78.36388 0.8767123 0.2939012 C12orf75 11
4.446859 6.174009 9.620204 0 0 38.29411 0.9041096 0.5675605 KLK10 11
4.445222 3.162950 10.388417 0 0 45.70811 0.6849315 0.1280231 KRT6A 11
3.834264 5.177970 13.740327 0 0 81.99544 0.8771930 0.4492715 SLC1A5 12
3.775817 5.414029 13.205823 0 0 75.39135 0.9064327 0.4705882 MARS 12
3.747142 5.288900 13.673544 0 0 81.15467 0.9239766 0.4749056 GARS 12
3.577583 5.351104 12.688685 0 0 69.29353 0.8830409 0.4490016 AXL 12
3.534286 3.613653 13.709260 0 0 81.97845 0.7368421 0.2115488 FSTL3 12
3.498838 6.432474 11.706103 0 0 58.14422 0.9415205 0.5866163 LAMB3 12
3.483771 6.371894 14.713662 0 0 94.39811 0.9766082 0.7139773 HMGA1 12
3.408765 5.010256 12.685398 0 0 69.32909 0.8888889 0.4363195 DDX39A 12
3.394393 4.070904 15.139305 0 0 100.67303 0.8947368 0.3664328 EMP3 12
3.383319 4.434041 12.948795 0 0 72.53221 0.8070175 0.3505127 SHMT2 12
##             used   (Mb) gc trigger    (Mb) limit (Mb)   max used    (Mb)
## Ncells   5403100  288.6    8458888   451.8         NA    8458888   451.8
## Vcells 510217535 3892.7 1493200851 11392.3      16384 1493199592 11392.3

Heatmap

markers <- markers[rowMaxs(as.matrix(markers[,c("ratio1","ratio2")])) > 0.4,]
    
top10 <- markers %>% group_by(cluster) %>% top_n(10, logFC)
plot.data <- logcounts(sceset)[top10$gene, order(sceset$clincluster, decreasing = F)]

colanno <- data.frame (colData(sceset)[,c("clincluster","Patient2")])
colnames(colanno)[1] <- "clusters"
colanno$clusters <- factor(colanno$clusters)

rownames(colanno) <- colnames(sceset)
colanno <- colanno[order(colanno$clusters, decreasing = F),]
colanno$clusters <- factor(colanno$clusters, levels = unique(colanno$clusters))
plot.data <- plot.data[,match(rownames(colanno), colnames(plot.data))] 

plot.data <- t(scale(t(plot.data), center = T, scale = T))
plot.data <- Seurat::MinMax(plot.data, min = -2, max = 2)

plot.data<- as.data.frame(x = t(x = plot.data))
plot.data$cell <- rownames(x = plot.data)

cells.ident <- sceset$clincluster
names(x = cells.ident) <- sceset$Sample
 
colnames(x = plot.data) <- make.unique(names = colnames(x = plot.data))
plot.data %>% melt(id.vars = "cell") -> plot.data
names(x = plot.data)[names(x = plot.data) == 'variable'] <- 'gene'
names(x = plot.data)[names(x = plot.data) == 'value'] <- 'expression'
plot.data$ident <- cells.ident[plot.data$cell]

 plot.data$gene <- with(
    data = plot.data,
    expr = factor(x = gene, levels = rev(x = unique(x = plot.data$gene)))
  )
   plot.data$cell <- with(
    data = plot.data,
    expr = factor(x = cell, levels = unique(x = colnames(sceset)))
  )

my_colours <- colorRampPalette(c("steelblue4", "white", "firebrick2"))(200)

heatmap <- ggplot( data = plot.data, mapping = aes(x = cell, y = gene, fill = expression)) + geom_tile() +
    scale_fill_gradient2(
        # low = muted("blue"), mid = "white", high = muted("red")
        low = muted("steelblue4"), mid = "white",
        high = muted("firebrick2"),
      name= "Expression", guide = guide_colorbar(
        direction = "vertical",
        title.position = "top"
      )
    ) +
    scale_y_discrete(position = "right", labels = rev(top10$gene)) +
    theme(
      axis.line = element_blank(),
      axis.title.y = element_blank(),
      axis.ticks.y = element_blank(),
      strip.text.x = element_text(size = 15),
      axis.text.y = element_text(size = 6),
      axis.text.x = element_text(size = 6),
      axis.title.x = element_blank()
    )

heatmap <- heatmap +
      facet_grid(
        facets = ~ident,
        drop = TRUE,
        space = "free",
        scales = "free",
        switch = 'x'
      ) +
      scale_x_discrete(expand = c(0, 0), drop = TRUE)  +
      theme(
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.line = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank()
      )

panel.spacing <- unit(x = 0.15, units = 'lines')
heatmap <- heatmap +
      theme(strip.background = element_blank(), panel.spacing = panel.spacing)

heatmap

Save results

# saveRDS(sceset,"../rds/20180917Sceset_12clusters.rds", compress = T)
# write.csv(markers,"../tables/20180917sceset_markers_12cluster.csv", row.names = T)

Figures

Figure 1C

current.cluster.ids <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ,12)
new.cluster.ids <- c("Leukocyte", 
                     "Cultured ciliated",
                     "O.N. cultured FTESCs",
                     "Cultured ciliated",
                     "O.N. cultured FTESCs",
                     "Stromal cells", 
                     "Fresh FTESCs",
                     "Fresh FTESCs",
                     "Fresh ciliated",
                     "Long cultured FTESCs 2", 
                     "Long cultured FTESCs 1", 
                     "O.N. cultured FTESCs")
sceset$ident <- plyr::mapvalues(x = sceset$clincluster, from = current.cluster.ids, to = new.cluster.ids)
sceset$ident <- factor(sceset$ident,
                       levels = c("Fresh FTESCs",
                                  "Fresh ciliated",
                                  "O.N. cultured FTESCs",
                                  "Cultured ciliated",
                                  "Long cultured FTESCs 1", 
                                  "Long cultured FTESCs 2", 
                                  "Leukocyte",
                                  "Stromal cells"))

sceset$sources <- sceset$source
sceset$sources[sceset$sources %in% c("2-day cultured", "6-day cultured")] <- "Long cultured" 
sceset$sources <- factor(sceset$sources, levels = c("Fresh","O.N. cultured","cryopreserved",
                                                    "Long cultured"))
plotTSNE(object = sceset,  colour_by = "ident")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.

t-SNE plot profiles ~3,600 single-cell transcriptome from fallopian tubes coloured by patients

plotTSNE(sceset,  colour_by = "Patient2")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.

# ggsave("../manuscript_plots/FigureS1A_tsne.tiff", res = 300, width = 7, height = 5.5, units = "in")

Expression plot of secretory markers PAX8, KART7, ciliated markers CAPS and CCDC17.

p1 <- plotTSNE(sceset,  colour_by = "EPCAM")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p2 <- plotTSNE(sceset,  colour_by = "PTPRC")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p3 <- plotTSNE(sceset,  colour_by = "PAX8")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p4 <- plotTSNE(sceset,  colour_by = "KRT7")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p5 <- plotTSNE(sceset,  colour_by = "CAPS")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p6 <- plotTSNE(sceset,  colour_by = "CCDC17")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p7 <-plotTSNE(sceset,  colour_by = "CCDC78")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
cowplot::plot_grid(p1,p2,p3,p4,p5,p6,p7, ncol=4)

Fresh cells
fresh <- sceset[,sceset$source == "Fresh"]

fresh$type <- "Secretory"
fresh$type[fresh$ident == "Fresh ciliated"] <- "Ciliated"
plotExpression(fresh, x = "type", features = c("KRT7","PAX8"), ncol = 5,xlab = "Cell type")  + theme(strip.text = element_text(size = 12, face = "italic") )

# ggsave("plots/SuppFig1H_secretory_markers.png", height = 2, width = 3.5)
plotExpression(fresh, x = "type", features = c("CCDC17","CCDC78","CAPS","FOXJ1"), ncol = 5,xlab = "Cell type")  + theme(strip.text = element_text(size = 12, face = "italic") )

# ggsave("../../revision_analysis_20190827/revision_plots/SI/SuppFig1I_ciliated_markers_foxj1ADDED.png", height = 2, width = 5.7)
# saveRDS(sceset,"../rds/sce_cancerFT_clustered.rds")

Technical

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.4
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2.2              cowplot_0.9.4              
##  [3] knitr_1.21                  reshape2_1.4.3             
##  [5] scales_1.0.0                dplyr_0.7.8                
##  [7] edgeR_3.24.3                limma_3.38.3               
##  [9] scater_1.10.1               ggplot2_3.2.0.9000         
## [11] SingleCellExperiment_1.4.1  SummarizedExperiment_1.12.0
## [13] DelayedArray_0.8.0          BiocParallel_1.16.5        
## [15] matrixStats_0.54.0          Biobase_2.42.0             
## [17] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
## [19] IRanges_2.16.0              S4Vectors_0.20.1           
## [21] BiocGenerics_0.28.0        
## 
## loaded via a namespace (and not attached):
##   [1] Seurat_3.0.2             Rtsne_0.15              
##   [3] ggbeeswarm_0.6.0         colorspace_1.4-0        
##   [5] ggridges_0.5.1           XVector_0.22.0          
##   [7] listenv_0.7.0            npsurv_0.4-0            
##   [9] ggrepel_0.8.0            codetools_0.2-16        
##  [11] splines_3.5.2            R.methodsS3_1.7.1       
##  [13] lsei_1.2-0               jsonlite_1.6            
##  [15] ica_1.0-2                cluster_2.0.7-1         
##  [17] png_0.1-7                R.oo_1.22.0             
##  [19] sctransform_0.2.0        HDF5Array_1.10.1        
##  [21] compiler_3.5.2           httr_1.4.0              
##  [23] assertthat_0.2.0         Matrix_1.2-15           
##  [25] lazyeval_0.2.1           htmltools_0.3.6         
##  [27] tools_3.5.2              rsvd_1.0.2              
##  [29] igraph_1.2.3             gtable_0.2.0            
##  [31] glue_1.3.0               GenomeInfoDbData_1.2.0  
##  [33] RANN_2.6.1               Rcpp_1.0.0              
##  [35] gdata_2.18.0             ape_5.2                 
##  [37] nlme_3.1-137             DelayedMatrixStats_1.4.0
##  [39] gbRd_0.4-11              lmtest_0.9-36           
##  [41] xfun_0.4                 stringr_1.4.0           
##  [43] globals_0.12.4           irlba_2.3.3             
##  [45] kknn_1.3.1               gtools_3.8.1            
##  [47] future_1.14.0            zlibbioc_1.28.0         
##  [49] MASS_7.3-51.1            zoo_1.8-4               
##  [51] rhdf5_2.26.2             RColorBrewer_1.1-2      
##  [53] yaml_2.2.0               reticulate_1.10         
##  [55] pbapply_1.4-0            gridExtra_2.3           
##  [57] stringi_1.2.4            highr_0.7               
##  [59] caTools_1.17.1.1         bibtex_0.4.2            
##  [61] Rdpack_0.10-1            SDMTools_1.1-221        
##  [63] rlang_0.4.0              pkgconfig_2.0.2         
##  [65] bitops_1.0-6             evaluate_0.13           
##  [67] lattice_0.20-38          ROCR_1.0-7              
##  [69] purrr_0.3.0              Rhdf5lib_1.4.2          
##  [71] bindr_0.1.1              htmlwidgets_1.3         
##  [73] labeling_0.3             tidyselect_0.2.5        
##  [75] plyr_1.8.4               magrittr_1.5            
##  [77] R6_2.3.0                 gplots_3.0.1.1          
##  [79] pillar_1.3.1             withr_2.1.2             
##  [81] fitdistrplus_1.0-14      survival_2.43-3         
##  [83] RCurl_1.95-4.11          tsne_0.1-3              
##  [85] tibble_2.0.1             future.apply_1.3.0      
##  [87] crayon_1.3.4             KernSmooth_2.23-15      
##  [89] plotly_4.8.0             rmarkdown_1.11          
##  [91] viridis_0.5.1            locfit_1.5-9.1          
##  [93] grid_3.5.2               data.table_1.12.0       
##  [95] metap_1.1                digest_0.6.18           
##  [97] tidyr_0.8.2              R.utils_2.7.0           
##  [99] munsell_0.5.0            beeswarm_0.2.3          
## [101] viridisLite_0.3.0        vipor_0.4.5

R version 3.5.2 (2018-12-20) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Mojave 10.14.4

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] grid parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] modes_0.7.0 ROCR_1.0-7 gplots_3.0.1.1 KernSmooth_2.23-15 fields_9.6 maps_3.3.0
[7] spam_2.2-2 dotCall64_1.0-0 DoubletFinder_2.0.1 Seurat_2.3.4 Matrix_1.2-15 cowplot_0.9.4
[13] bindrcpp_0.2.2 reshape2_1.4.3 scales_1.0.0 dplyr_0.7.8 edgeR_3.24.3 limma_3.38.3
[19] scater_1.10.1 ggplot2_3.1.0 SingleCellExperiment_1.4.1 SummarizedExperiment_1.12.0 DelayedArray_0.8.0 BiocParallel_1.16.5
[25] matrixStats_0.54.0 Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.1 IRanges_2.16.0 S4Vectors_0.20.1
[31] BiocGenerics_0.28.0

loaded via a namespace (and not attached): [1] reticulate_1.10 R.utils_2.7.0 tidyselect_0.2.5 htmlwidgets_1.3 trimcluster_0.1-2.1 Rtsne_0.15 devtools_2.0.1
[8] munsell_0.5.0 codetools_0.2-16 ica_1.0-2 statmod_1.4.30 scran_1.10.2 umap_0.1.0.3 withr_2.1.2
[15] colorspace_1.4-0 knitr_1.21 rstudioapi_0.9.0 robustbase_0.93-3 dtw_1.20-1 gbRd_0.4-11 Rdpack_0.10-1
[22] labeling_0.3 lars_1.2 GenomeInfoDbData_1.2.0 pheatmap_1.0.12 bit64_0.9-7 rhdf5_2.26.2 rprojroot_1.3-2
[29] xfun_0.4 diptest_0.75-7 R6_2.3.0 ggbeeswarm_0.6.0 locfit_1.5-9.1 hdf5r_1.0.1 flexmix_2.3-14
[36] bitops_1.0-6 assertthat_0.2.0 SDMTools_1.1-221 nnet_7.3-12 beeswarm_0.2.3 gtable_0.2.0 npsurv_0.4-0
[43] processx_3.2.1 rlang_0.3.1 splines_3.5.2 lazyeval_0.2.1 acepack_1.4.1 checkmate_1.9.1 yaml_2.2.0
[50] backports_1.1.3 Hmisc_4.2-0 tools_3.5.2 usethis_1.4.0 RColorBrewer_1.1-2 proxy_0.4-22 dynamicTreeCut_1.63-1
[57] sessioninfo_1.1.1 ggridges_0.5.1 kknn_1.3.1 Rcpp_1.0.0 plyr_1.8.4 base64enc_0.1-3 zlibbioc_1.28.0
[64] purrr_0.3.0 RCurl_1.95-4.11 ps_1.3.0 prettyunits_1.0.2 rpart_4.1-13 pbapply_1.4-0 viridis_0.5.1
[71] zoo_1.8-4 cluster_2.0.7-1 fs_1.2.6 magrittr_1.5 data.table_1.12.0 lmtest_0.9-36 RANN_2.6.1
[78] mvtnorm_1.0-8 fitdistrplus_1.0-14 pkgload_1.0.2 evaluate_0.13 lsei_1.2-0 mclust_5.4.2 gridExtra_2.3
[85] compiler_3.5.2 tibble_2.0.1 crayon_1.3.4 R.oo_1.22.0 htmltools_0.3.6 segmented_0.5-3.0 Formula_1.2-3
[92] snow_0.4-3 tidyr_0.8.2 MASS_7.3-51.1 fpc_2.1-11.1 cli_1.0.1 R.methodsS3_1.7.1 gdata_2.18.0
[99] metap_1.1 bindr_0.1.1 igraph_1.2.3 pkgconfig_2.0.2 foreign_0.8-71 foreach_1.4.4 vipor_0.4.5
[106] XVector_0.22.0 bibtex_0.4.2 stringr_1.4.0 callr_3.1.1 digest_0.6.18 tsne_0.1-3 rmarkdown_1.11
[113] htmlTable_1.13.1 DelayedMatrixStats_1.4.0 curl_3.3 kernlab_0.9-27 gtools_3.8.1 modeltools_0.2-22 nlme_3.1-137
[120] jsonlite_1.6 Rhdf5lib_1.4.2 BiocNeighbors_1.0.0 desc_1.2.0 viridisLite_0.3.0 pillar_1.3.1 lattice_0.20-38
[127] httr_1.4.0 DEoptimR_1.0-8 pkgbuild_1.0.2 survival_2.43-3 glue_1.3.0 remotes_2.0.2 png_0.1-7
[134] prabclus_2.2-7 iterators_1.0.10 bit_1.1-14 class_7.3-15 stringi_1.2.4 HDF5Array_1.10.1 mixtools_1.1.0
[141] doSNOW_1.0.16 latticeExtra_0.6-28 caTools_1.17.1.1 memoise_1.1.0 irlba_2.3.3 ape_5.2